1. Introduction

Limited proteolysis is responsible for activating a wide range of proteins from immature forms and is hence implicated in a number of biologically important systems [1-3]. Proteases have also been used widely throughout the field of biochemistry in many areas of study including sequencing, enzyme (in)activation and complete degradation [4,5]. Often these hydrolytic reactions take place under denaturing conditions, so that every peptide bond is cut where the local amino acid sequence satisfies the specificity requirements of the protease in question. This application is useful for determination of amino acid sequence, such as the generation of a complete limit digest for mass spectrometric analysis. The protease trypsin is now routinely used to produce such a limit "map" for protein identification. However, the denaturation step precludes any information being gained on the three-dimensional structure of the protein. In order to retrieve structural information through proteolytic digestion it is necessary to limit the reaction in some way, leading to the notion of "limited" proteolysis. A classic example of a limited proteolysis is the cleavage of ribonuclease A by subtilisin at a bond between residues 20 and 21. The product, ribonuclease S, is made up of the non-covalently associated S-peptide and S-protein and is fully active [6].

Limitation of the proteolytic reaction can be achieved in a number of ways, such as by alteration of the reaction conditions (temperature, pH, ionic strength), or more typically, by ensuring the substrate protein is in a native (or near native) state. If this last condition is satisfied, limited proteolysis is a powerful tool for probing the higher order structure of proteins, providing information on the location of particular peptide bonds with respect to the overall fold of the protein. The premise underpinning such studies is the sequence/structure paradigm of limited proteolysis: that higher order structure and not primary sequence is the main determinant of the site of initial hydrolysis. For example, trypsin cleaves peptide bonds C-terminal to the basic amino acids lysine and arginine. Assuming that 1 in 10 of all amino acids in a protein sequence are lysine or arginine, 10% of all peptide bonds are in principle accessible to proteolytic attack and subsequent cleavage by this enzyme. However, under non-denaturing conditions this is rarely the case and typically only a few, one or sometimes even none, of the putative bonds are cut in the time scale of a typical experiment. The reasons for this are obviously structural. Peptide units buried in the protein core and in regular secondary structural elements are less accessible to the enzyme and are therefore not cleaved as quickly. Hence, a priori, it would be expected that the observed cleavages occur at broadly "surface" sites that are exposed to the surrounding solvent, such as loops. This assumption, borne out by experimental observation, has been widely exploited to infer the accessibility of sites in proteins of unknown structure. The same premise applies to the identification of domain linking segments, which are similarly expected to be surface exposed and readily cleaved by proteinases. Knowledge of the surface amino acid residues is clearly of great benefit for understanding protein function as it can inform on likely epitopes and receptor-interaction sites. This general approach has also now been extended to the study of near-native partly unfolded states, to gain information on protein folding and its pathways [7].
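The counting argument for trypsin can be illustrated with a minimal sketch. It assumes only the rule stated above (cleavage C-terminal to lysine or arginine); the function names and the 20-residue example sequence are hypothetical and serve purely as an illustration.

```python
def putative_tryptic_bonds(sequence: str) -> list[int]:
    """Return 1-based residue numbers i such that the peptide bond between
    residue i and residue i+1 follows a lysine (K) or arginine (R).

    Simplification: only the Lys/Arg rule from the text is applied; other
    effects (for example a following proline) are deliberately ignored.
    """
    seq = sequence.upper()
    # The final residue has no peptide bond on its C-terminal side.
    return [i + 1 for i, aa in enumerate(seq[:-1]) if aa in "KR"]


def fraction_cleavable(sequence: str) -> float:
    """Fraction of all peptide bonds that satisfy the Lys/Arg rule."""
    n_bonds = len(sequence) - 1
    if n_bonds <= 0:
        return 0.0
    return len(putative_tryptic_bonds(sequence)) / n_bonds


# Hypothetical 20-residue sequence with one Lys and one Arg (10% K/R).
example = "AGSLKDTVEAGFNRLSDVAG"
print(putative_tryptic_bonds(example))        # bonds after residues 5 and 14
print(round(fraction_cleavable(example), 3))  # 2 of 19 bonds
```

Such a list only enumerates the putative sites; which of them, if any, are actually cut in the native protein is the structural question addressed in the remainder of this review.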

Although this technique provides a simple, cheap and effective method for detecting exposed surface sites, there remain caveats to the model. Specifically, is it really only exposure that dictates the site of cleavage for a protease of broad specificity? A number of research groups, including our own, have studied limited proteolysis as a molecular recognition system to provide a more precise understanding of the structural determinants of this process. Factors other than surface exposure must also act as key determinants of vulnerability to proteolysis. Clearly, a more detailed understanding of the molecular events involved in the recognition process is required in order to use limited proteolysis as a thoroughly reliable probe of protein structure.

This review will focus on limited proteolysis as a tool for the investigation of protein structure, using recent exemplary studies to highlight its utility, as well as pointing out the potential pitfalls. The underlying structural determinants will be discussed by reference to a number of recent studies on proteins of known structure, as well as a statistical survey of the structural properties of limited proteolytic sites (nicksites) and attempts to predict them a priori from structure and sequence. Finally, future directions for this research area will be addressed, including the use of limited proteolysis in the study of protein folding and potential biotech applications.

2. The sequence-structure paradigm of limited proteolysis

The basic paradigm of limited proteolysis has already been described: namely, the masking of primary specificity by tertiary structure. It is worth briefly describing the origins of primary specificity. The earliest systems to be structurally characterised were the serine proteinases chymotrypsin and trypsin [8,9]. Their X-ray crystal structure determinations allowed the known amino acid sequence specificities to be rationalised in terms of subsite binding pockets located at the enzyme active site cleft [10]. The subsites are classified using the notation of Schechter and Berger [11] as Sn, where each subsite corresponds to the side chain of the substrate polypeptide, Pn, which binds into it. The side chains are numbered from P1 on the amino-terminal side of the scissile peptide and from P1' on the carboxy-terminal side. Hence the primary specificity of the enzyme is usually defined in terms of the P1 side chain preference for the S1 subsite. In trypsin, a deep S1 site is formed that possesses an acidic aspartic acid (Asp189) at the pocket base. This gives trypsin a strong preference for the basic amino acids lysine and arginine at P1. In fact, under normal conditions trypsin will cleave at no other bond. In chymotrypsin, the equivalent position to this acidic side chain is substituted by a neutral residue, serine, and the hydrophobic properties of the pocket become dominant. This leads to a preference for hydrophobic (and preferably aromatic) side chains. In elastase, the pocket is partly filled by the mutation of two glycine residues in trypsin and chymotrypsin at Gly214 and Gly226 to valine and threonine, respectively, leading to a hindrance of all but the small amino acids such as alanine and serine at the P1 position. Recently, Laskowski and co-workers have attempted to quantify the effect of all of the 20 commonly occurring amino acids at P1 for a core of six well known serine proteases [12]. This was achieved by recombinant techniques on the ovomucoid inhibitor from turkey. Binding studies yielded kinetic data which were converted to free energies to give a quantitative scale for all the amino acids at the primary recognition position for these six enzymes. In general, the scales were unsurprising and might have been predicted, at least qualitatively. The hydrophobic interaction was observed to be the dominant force, moderated by size and shape effects and polarity. However, it was also clear that minor alterations in the pocket geometry could have a large effect on the overall preference of even very closely related enzymes such as human leucocyte and pancreatic elastase [12,13].
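The Schechter and Berger numbering described above can be made concrete with a short sketch. The function name, its parameters and the ten-residue example sequence are assumptions introduced here for illustration; only the labelling convention itself (P1 on the amino-terminal side of the scissile bond, P1' on the carboxy-terminal side) is taken from the text.

```python
def label_subsites(sequence: str, p1_position: int, span: int = 4) -> dict[str, str]:
    """Label substrate residues around a scissile bond.

    p1_position is the 1-based number of the residue whose C-terminal
    peptide bond is cleaved (the P1 residue). Residues are labelled
    P1..Pspan towards the N-terminus and P1'..Pspan' towards the C-terminus.
    Positions that fall off either end of the sequence are omitted.
    """
    if not 1 <= p1_position < len(sequence):
        raise ValueError("p1_position must have a residue on its C-terminal side")
    labels: dict[str, str] = {}
    for k in range(1, span + 1):
        n_idx = p1_position - k        # 0-based index of Pk
        c_idx = p1_position + k - 1    # 0-based index of Pk'
        if n_idx >= 0:
            labels[f"P{k}"] = sequence[n_idx]
        if c_idx < len(sequence):
            labels[f"P{k}'"] = sequence[c_idx]
    return labels


# Hypothetical sequence; the bond after Lys5 is taken as the scissile bond.
print(label_subsites("AGSLKDTVEA", p1_position=5))
```

For a trypsin-like enzyme the residue reported as P1 would be the lysine or arginine that occupies the S1 pocket, with the remaining labels identifying the residues available to any secondary subsites.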

In addition to the primary specificity, some enzymes exhibit a secondary specificity at sites removed from the scissile bond. Subtilisin shows a strong preference for hydrophobic side chains at the P4 position [14]. In subtilisin BPN' the amino acids Tyr104, Ile107 and Leu126 create a hydrophobic pocket that demonstrates a preference for the following amino acid side chains at this position: Phe > Leu, Ile, Val > Ala [15]. Other secondary subsites have been discovered in other proteinases although their effects on binding and catalysis are usually small.

Despite these primary and secondary subsite preferences, subtilisin will readily cleave after almost any amino acid. In some cases, the subsite specificity of the enzyme therefore becomes almost irrelevant. In the context of limited proteolysis, the proteinases may be classed into two categories: those with very narrow specificity requirements such as trypsin, V8-proteinase and endoproteinase Arg-C, and those with relatively broad specificity requirements such as subtilisin, thermolysin and proteinase K. Although sequence preferences must play a role in dictating the site of ultimate hydrolysis in the folded protein, the influence of tertiary structure is expected to be more pronounced for the broader specificity proteinases.

3. Experimental considerations 

Reaction conditions for limited proteolysis are typically chosen to ensure that complete degradation of the substrate protein does not take place. In order to limit the digestion, a number of methods are typically employed. Often, the enzyme:substrate ratio is restricted to somewhere between 1:50 and 1:1000 so that proteolysis is incomplete and intermediates may be observed accumulating over time. Additional techniques include performing the digest at low temperature and non-optimal pH for the enzyme. However, some enzyme-substrate systems require no retardation as the protein substrate is strongly resistant to proteolysis. In these instances, it may be necessary to use higher enzyme concentrations, elevated temperatures, extended digest times or even low concentrations of denaturing agents. In the latter case, it is important from the point of view of structural inference that the tertiary fold of the protein has not been greatly distorted. No significant loss of function or activity should be observed and the protein should possess the same structural properties. Generally, there are no hard and fast rules governing enzyme:substrate ratios, and optimal experimental conditions to study limited proteolysis must be found by preliminary experimentation.
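As a rough aid to planning such preliminary experiments, the quoted range of ratios can be turned into enzyme quantities. The sketch below assumes the ratio is expressed weight for weight, which the text does not specify; the function name and the 100 µg substrate amount are hypothetical.

```python
def enzyme_mass_ug(substrate_mass_ug: float, ratio_denominator: float) -> float:
    """Mass of enzyme (µg) for a 1:ratio_denominator enzyme:substrate ratio.

    Assumption: the ratio is weight/weight. For a molar ratio the two
    molecular masses would also be needed.
    """
    if substrate_mass_ug < 0 or ratio_denominator <= 0:
        raise ValueError("substrate mass must be >= 0 and ratio denominator > 0")
    return substrate_mass_ug / ratio_denominator


# Hypothetical trial series for 100 µg of substrate protein.
for denominator in (50, 100, 500, 1000):
    print(f"1:{denominator} -> {enzyme_mass_ug(100.0, denominator):.2f} µg enzyme")
```

A series like this would normally be combined with a time course, since the ratio and the digestion time together determine whether intermediates accumulate.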

Most proteolytic reactions are monitored via SDS-PAGE, a simple and cheap technique available to most laboratories. However, gel electrophoresis of proteinase digests will rarely yield the precise site of limited proteolysis unless the proteinase is particularly narrow in its primary specificity and the amino acid sequence fortuitously disposed. To determine the exact site(s) of proteolysis, further studies are required such as Edman degradation sequencing chemistry. Mass spectrometry methods are also being used with increasing frequency due to the high mass accuracy they are able to yield, particularly with low sample quantities [16,17].
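The reason a gel alone seldom pinpoints the nicksite can be seen from a back-of-the-envelope estimate of fragment sizes. The sketch below assumes a rule-of-thumb average residue mass of about 110 Da, which is an approximation introduced here and not a value from the studies cited; the 150-residue protein and the nick positions are hypothetical.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rough rule of thumb; an assumption


def fragment_masses_kda(n_residues: int, nick_after: int) -> tuple[float, float]:
    """Approximate masses (kDa) of the N- and C-terminal fragments produced
    by a single nick after residue nick_after (1-based)."""
    if not 1 <= nick_after < n_residues:
        raise ValueError("nick_after must lie inside the chain")
    n_term = nick_after * AVERAGE_RESIDUE_MASS_DA / 1000.0
    c_term = (n_residues - nick_after) * AVERAGE_RESIDUE_MASS_DA / 1000.0
    return n_term, c_term


# Two adjacent candidate nicksites in a hypothetical 150-residue protein.
for site in (48, 49):
    n_kda, c_kda = fragment_masses_kda(150, site)
    print(f"nick after residue {site}: ~{n_kda:.2f} kDa + ~{c_kda:.2f} kDa")
```

Neighbouring candidate bonds give fragments differing by only about 0.1 kDa in this estimate, a difference that is unlikely to be resolved on a typical gel, which is why sequencing or mass spectrometry is needed to assign the exact bond.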

4. Structural determinants of limited proteolysis

Early studies on limited proteolysis considered the reaction in the context of the compact, globular structure of folded proteins, and Linderstrøm-Lang proposed two pathways for degradation dependent on the stability of the protein [18]. In the first case, the initial nicking may destroy the stability of the protein, which subsequently unfolds, exposing all peptide bonds to attack, and the protein is degraded to completion. In the second case, the protein retains its overall structure and remains folded, preventing general degradation from taking place. In the light of subsequent high resolution X-ray crystal structure determinations of known substrate proteins, Neurath proposed the original hypothesis that limited proteolysis occurs at "hinges and fringes" such as exposed surface loops and domain linking segments [4]. These theories were further expanded by other workers who attempted to quantify the contributions of exposure and flexibility to limited proteolysis. One study considered the known autolytic and subtilisin cleavage sites in the protein thermolysin, itself a proteinase [19,20]. A plot of the residue-averaged atomic temperature factors of thermolysin obtained from a crystallographic determination demonstrated that the limited proteolytic sites exhibited a remarkable correlation with the peaks in the profile. Temperature factors are in part derived from the thermal motions that occur in the crystalline protein molecule and represent a useful measure of the segmental mobility of the substrate amino acid chain. Hence, the limited proteolytic sites in thermolysin occurred in flexible regions of the molecule. An example of this correlation is shown in Fig. 1 for the protein staphylococcal nuclease, which is cleaved by trypsin at Lys48-Lys49 and Lys49-Gly50 in non-denaturing conditions [21,22]. The profile shows that these two trypsin nicksites are situated at the largest peaks and are therefore the most flexible putative tryptic sites (as characterised by temperature factors).
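A temperature factor profile of this kind can be computed directly from a coordinate file. The following is a minimal sketch, not the procedure used in the studies cited: it assumes the standard fixed-column PDB ATOM record layout (atom name in columns 13-16, residue name in 18-20, chain in 22, residue number in 23-26, temperature factor in 61-66), ignores alternate locations and insertion codes, and uses hypothetical function names. The three-residue demonstration input is synthetic.

```python
from collections import defaultdict

BACKBONE_ATOMS = {"N", "CA", "C", "O"}


def backbone_b_profile(pdb_lines, chain_id="A"):
    """Return [(residue_number, residue_name, mean_backbone_B), ...] for one chain."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    names = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, REMARK, etc.
        if line[21] != chain_id or line[12:16].strip() not in BACKBONE_ATOMS:
            continue
        res_seq = int(line[22:26])
        sums[res_seq] += float(line[60:66])
        counts[res_seq] += 1
        names[res_seq] = line[17:20].strip()
    return [(r, names[r], sums[r] / counts[r]) for r in sorted(sums)]


def rank_putative_tryptic_sites(profile):
    """Sort Lys/Arg residues (candidate P1 positions) by decreasing mean B."""
    sites = [entry for entry in profile if entry[1] in ("LYS", "ARG")]
    return sorted(sites, key=lambda entry: entry[2], reverse=True)


def make_atom_line(serial, name, res_name, chain, res_seq, b_factor):
    """Format a minimal ATOM record (zeroed coordinates) for the demo only."""
    return (f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain}{res_seq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b_factor:6.2f}")


# Synthetic three-residue example with made-up temperature factors.
demo, serial = [], 1
for res_seq, res_name, b in [(1, "LYS", 12.0), (2, "GLY", 25.0), (3, "ARG", 40.0)]:
    for atom in ("N", "CA", "C", "O"):
        demo.append(make_atom_line(serial, atom, res_name, "A", res_seq, b))
        serial += 1
demo.append(make_atom_line(serial, "CB", "ARG", "A", 3, 90.0))  # side chain: ignored

profile = backbone_b_profile(demo)
print(rank_putative_tryptic_sites(profile))
```

Ranking the putative sites in this way only reproduces the flexibility argument; as discussed below, temperature factors are correlated with accessibility and protrusion, so a high rank does not by itself establish the cause of susceptibility.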

An opposing hypothesis to these flexibility theories was obtained by using a large spherical probe as a model of a proteinase and calculating the accessible surface when the probe was rolled around the surface of three protein structures [23]. These data suggested that surface exposure and not flexibility was the prime determinant of limited proteolysis. However, subsequent studies of an increased number of proteins have not reproduced these findings. Fontana, when reviewing the thermolysin data [24,25], expanded the considerations to other systems and concluded again that segmental mobility of the protein chain is the overriding factor in determining susceptibility.

Fig. 1. Temperature factor profile for staphylococcal nuclease. Mean residue temperature factors (averaged over all backbone atoms) are plotted along the amino acid sequence of the protein (horizontal axis: residue number). All putative trypsin nicksites are indicated by filled circles on the profile and the actual nicksites cut by the enzyme are indicated by arrows.

Other studies have considered the precise conformation of the substrate regions of nicked proteins more carefully [26]. At some stage prior to cleavage, the substrate peptide must closely match the structure of the "idealised substrate" conformation exhibited by the canonical binding-loop conformation observed in the families of small protein inhibitors. When compared one to another, the protein inhibitor binding loops were seen to possess practically identical backbone conformations spanning the P2-P2' region [26,27]. A superposition of some representative serine proteinase inhibitor binding loops is shown in Fig. 2(a), depicting the high degree of structural backbone conservation. In Fig. 2(b), the conserved interactions made by these inhibitor structures to the enzyme are schematised using the pancreatic secretory trypsin inhibitor (PSTI)/trypsin complex as representative. The peptide carbonyl carbon is approached by the attacking catalytic serine of the proteinase, whilst the P1 side chain binds into the S1 pocket. The carbonyl oxygen is bound by the amide groups of enzyme residues Gly193 and Ser195, which form the so-called oxyanion binding pocket. These interactions, or their equivalents, would be expected to be made prior to cleavage for all serine protease substrates.

Tryptic limited proteolytic sites were found to possess totally different structures to the conserved inhibitor template, and were also different to each other [26]. Despite this, the nicksites were also generally well correlated with accessibility, protrusion and flexibility (as characterised by X-ray crystallographic temperature factors). However, from inspection and simple modelling experiments, it was clear that a structural rearrangement would be required, primarily a local unfolding step, in order for proteolysis to take place. This conclusion was confirmed by a series of loop modelling experiments designed to test this theory and attempt to quantify the degree of local unfolding required [28]. Plausible substrate models for known proteolytically cleaved loops were only achieved when upwards of 10 amino acids were allowed to unfold locally. Similarly, the modelling protocol demonstrated that β-strand was unsuited to hosting a limited proteolytic site, although such sites were plausible within α-helices. This is illustrated in Fig. 3, where a loop containing the required template conformation from P2-P2' may be introduced into a helical segment but not into a beta (extended) region. The rationale for this is that too many inter-strand hydrogen bonds would need to be broken and the ends brought closer together to deform an extended chain segment such as found in β-sheets. The deformation of extended segments such as β-strands is necessary to allow the nicksite region to fit into the enzyme active site without introducing a large number of unfavourable contacts with the lip of the cleft and the rest of the enzyme. Although there are fewer geometrical restrictions within α-helices, similar energetic constraints are believed to disfavour the location of nicksites in the centre of helices [7]. Indeed, this disposition of nicksites to occur in regions of non-secondary structure is widely observed.

In addition to the modelling of known nicksites, all the putative tryptic sites in elastase were passed through the same modelling procedure. This produced the interesting result that the most easily modelled site was the true nicksite at Arg125-Ala126 [29]. Successful models could only be produced for a handful of the possible sites. A re-modelled loop was deemed acceptable if it contained the requisite conformation from P3-P3', did not make unfavourable interactions with the rest of the protein bulk and could be "docked" into the active site of the enzyme without introducing a large number of unfavourable intermolecular contacts. The 4 or 5 sites where this was possible were also assessed by the number of hydrogen bonds each loop made to the rest of the protein and the amount of surface area they buried. The true nicksite was found to make the fewest hydrogen bonds with the rest of the protein and buried relatively little surface. Thus, the most susceptible site was characterised by making the fewest interactions with the rest of the protein and is therefore the most readily locally unfolded.

Although conformational parameters such as accessibility, segmental mobility and protrusion correlate quite well with limited proteolytic sites, these parameters are themselves highly correlated [30]. As pointed out by Fontana [7], this in itself is not enough to rationalise the limited proteolysis phenomenon, as it is not clear which of these factors are important and which are merely correlated. Indeed, even when a protein structure is known to high resolution atomic detail and its dynamic properties characterised, it is not always possible to explain proteolysis results. What remains unchallenged is the need for local unfolding to take place, and it seems most likely that the regions of protein structure most able to do this will be those characterised by the conformational parameters discussed here.

Fig. 3. Secondary structural models for local unfolding. The two regular secondary structure states, α-helix and β-strand, are depicted as MOLSCRIPT [77] cartoons. Loop closure modelling experiments were able to show that local unfolding could occur within regular helix (panel label: "Local unfolding is possible") although it was not geometrically possible within extended β-structure without gross deformation of the protein structure (panel label: "Local unfolding is impossible without major deformation").

5. Limited proteolysis as a structural probe 

The incomplete understanding of the factors governing limited proteolysis notwithstanding, it is still widely used as a probe for protein structure [31]. This is based on the early hypothesis that limited proteolysis occurs exclusively at "hinges and fringes" [4]. In the case of native globular protein structures the reaction is expected to occur primarily at flexible surface loops and, in the case of multi-domain proteins, at similarly mobile linker regions between the domains. The hydrolysis typically yields a nicked species that retains its overall fold under non-denaturing conditions for most single domain proteins and potentially separates the individual domains for multi-domain proteins. Hence limited proteolysis may be used to monitor protein surface regions, ligand-induced conformational changes, domain boundaries and protein unfolding/refolding. Some examples of the applications of limited proteolysis to the understanding of basic structural data concerning exposure and domain organisation and function will be presented here to illustrate the differing utility and limitations of these approaches. Applications of the technique to ligand effects and folding/stability will be covered in the following sections.

Fig. 2. Stereochemistry at the recognition regions during limited proteolysis. In A, a MOLSCRIPT [77] cartoon of the structural superposition of the binding loops from P3-P3' of five serine protease inhibitors is shown. Only the backbone atoms are shown for clarity, apart from the P1 side chain of lysine 15 from BPTI. The inhibitor binding loops show remarkable structural conservation from P3-P3' but begin to diverge outwards from this point. In B, a LIGPLOT [78] schematic of the interactions made by the inhibitor PSTI to trypsinogen (databank code 1TGS) is shown; these are a common feature of this recognition system. The catalytic serine 195 approaches the carbonyl carbon of the peptide bond under attack while the mainchain amides of residues 193 and 195 form an oxyanion binding pocket for the peptide carbonyl oxygen. Anti-parallel beta sheet interactions are made from residues (214 and 216 in this system) to the inhibitor backbone on the non-prime side.

For proteins of unknown structure, limited proteolysis provides a simple method to gain insights into their tertiary fold. Often the results from such biochemical analyses can be employed to complement modelling studies, as was done for the lipase from Pseudomonas aeruginosa [32]. In this instance, cleavage at Asp38-Gly39 and Glu46-Val47 by S. aureus proteinase V8 confirmed that these residues were surface exposed and this information was incorporated into a 3-dimensional model of the protein. Indeed, the multiple sequence alignment of this lipase to a relative of known structure positioned these residues at surface exposed positions. In other studies, proteolysis by a range of proteinases was employed to gain insights into the structure of the human estrogen receptor ligand binding domain [33]. Extensive limited proteolytic experiments isolated the core binding domain to lie between residues Asn304-Lys529, and suggested that the C-terminal domain from 530-553 is most likely surface exposed.

Proteolysis has also been used as a probe of the structure and dynamics of hirudin from Hirudinaria manillensis [34]. The 3-dimensional structure of the relative from Hirudo medicinalis was used to help interpret the results from limited proteolysis with V8 proteinase, trypsin, thermolysin and subtilisin, all of which cleave in the region of residues 41-49. These data demonstrate that, like the homologue from H. medicinalis, hirudin from H. manillensis possesses a well structured N-terminal core domain and a flexible C-terminal loop that is readily cleaved by limited proteolysis.

The understanding of the inhibitory mechanism of serpins, a family of serine proteinase inhibitors that control the proteolytic pathways of blood coagulation, fibrinolysis and inflammation, has also been advanced by experimentation using limited proteolysis [35,36]. Serpins are present in the blood in a number of circulatory forms, of which one latent form involves a partial insertion of the reactive centre loop into a β-sheet. This loop insertion partly protects the loop against cleavage at the hinge region [35]. By forming a binary complex with synthetic peptides, Carrell and co-workers [36] were able to show that the limited proteolytic susceptibility of the reactive site loop could be modified. This provided further evidence for the loop insertion model of the latent forms of the serpins, with the loop adopting a helical conformation similar to that found in the non-inhibitory ovalbumin [37].

Structural information on membrane proteins may also be gained via limited proteolysis experiments, particularly concerning the protein topology with respect to the membrane. This approach relies on the premise that proteases of low specificity will completely degrade the portions of the protein on the cytoplasmic or periplasmic side of the membrane, and is well reviewed by Piatt [38]. A recent example of this approach is the application of limited proteolysis to the melibiose permease from E. coli [39]. Here, not only were transmembrane-connecting loops detected but evidence for their specific roles in substrate binding and catalysis was inferred from differential proteolytic susceptibilities induced by the presence of melibiose and Na+ and Li+.

In addition to exploration of surface exposure, limited proteolysis has been widely used to identify and separate individual functional domains within multi-domain proteins [4,5]. Assuming that proteolytic nicking occurs at interdomain linking regions, determining which domain is responsible for a given function is simply a matter of isolating the products and assaying for activity. This approach depends on the exact domain nature of the protein in question and may not be universally applicable, particularly when the domain structure is complex and formed by the amino acid chain crossing between domains several times. Some recent examples are presented here.

The approach is exemplified by studies on the domain structure of a DNA methyltransferase restriction enzyme [40]. Digests performed with trypsin and chymotrypsin have helped identify the regions responsible for specificity, methylation and restriction in this multi-subunit enzyme. The trypsin cleaved enzyme contains two methylation domains and one nicked specificity domain. Similarly, the domain structure of chicken liver xanthine dehydrogenase was predicted by digest with subtilisin [41]. By N-terminal sequencing of the products, examination of previous sequence analyses and performing activity assays, the authors were able to assign a three domain structure to the protein whereby the 20, 37 and 84 kDa fragments were demonstrated to contain the iron-sulphur, FAD and molybdenum centres respectively.

A further example of functional assignment is given by proteolysis experiments on pancreatic lipase [42]. The enzyme hydrolyses triglycerides only in the presence of colipase. Limited chymotryptic digestion of the porcine and human protein yielded a stable 12 kDa C-terminal domain that was inactive but was able to bind the co-factor. The N-terminal section was fully degraded. Conversely, digestion of the horse enzyme produces a stable 45 kDa N-terminal fragment that was active but could not properly bind the colipase. Thus limited proteolysis has helped localise the two distinct roles of the enzyme in performing its function and putatively assign the domains. It is interesting to note here that the same enzyme from different (but closely related) species can possess quite different limited proteolytic susceptibility properties. In the pig and human enzyme, the C-terminal domain remains stable against limited chymotryptic attack whilst in the horse enzyme it is the N-terminal domain. It is quite possible that the ultimate route of digestion is altered by only one or two amino acid changes altering the relative susceptibility of only a few bonds. The resulting proteolytic degradation pathway is altered in favour of one or the other of the two domains being hydrolysed in the different species.

There are many similar examples in the literature, too numerous to mention here, where limited proteolytic fragmentation of a large multi-domain protein has led to an understanding of the different functions of the individual domains. This approach promises to retain its utility now that the identification of high resolution masses has become more routine due to recent advances in mass spectrometry. However, as stated previously, there remain some caveats. Domains are not always formed by contiguous stretches of protein, in which case the concept of "mobile hinges" becomes less applicable. Furthermore, in some instances, loops situated away from domain linking regions may additionally be proteolysed, leading to potential misinterpretations. Notwithstanding, limited proteolysis has been widely used to isolate domains not only to identify function, but also to create smaller units for structural analysis by X-ray crystallography or, increasingly, by NMR.

6. Ligand effects on limited proteolysis

The presence or absence of a bound ligand can profoundly affect the susceptibility of a protein segment to limited proteolysis although the mechanism by which these effects are achieved is not always clear. Indeed, the changes produced may be remote from the site of interaction of the protein with its substrate or co-factor. A number of exemplary cases will be presented here.

The effects that Ca2+-binding can have on proteolytic susceptibility have been known for some time. Indeed, addition of mM concentrations of calcium salts to reaction buffers has long been recognised as a technique to inhibit autolytic degradation of trypsin itself [43]. Presumably calcium is bound to a mobile loop segment, acting as a tether and reducing the ability of the loop to deform and enter the enzyme's active site. Similarly, calcium ion concentration alters the proteolytic susceptibility of the calcium binding protein calmodulin to endoproteinase Glu-C. The susceptibility of two sites in particular has been monitored and yielded surprising results [44]. A putative cleavage site at Glu87-Ala88 lies close to calcium binding site III and, although susceptible in the apo-form, this site is fully protected against cleavage when calcium is bound at this site under the same reaction conditions. More surprising is the change in susceptibility of Glu31-Leu32, which is situated in calcium binding site I and is resistant to proteolysis in the apo-protein. Susceptibility was enhanced at this bond when calcium was bound at sites III and IV, with free calcium concentrations in the μM range. However, at concentrations above 100 mM and in otherwise the same conditions, full protection against attack was returned when calcium was presumably bound at site I and possibly also site II. These proteolytic "footprinting" experiments demonstrate that calmodulin must occupy at least three conformational substates dependent on its calcium-bound state.

Calcium-binding loops serve as a typical model for 
ligand-induced effects on proteolytic susceptibility. 
Recent work grafted a calcium-binding loop onto an 
existing protein via protein engineering and tested the 
subsequent proteolytic properties of the mutant 
protein [45]. A surface loop in the neutral protease from 
Bacillus subtilis was replaced by a longer loop from 
the homologous enzyme thermolysin which is already 
known to bind calcium. The mutant enzyme was 
demonstrated to be able to bind calcium, albeit 
weakly. This was sufficient, however, to diminish 
autolysis of the neutral protease, which also 
demonstrated increased kinetic thermal stability in solutions 
containing 0.1 M CaCl2. This strongly suggests that 
the engineered loop is protected from autolytic 
degradation by calcium binding. 

Another recently studied ligand-binding system is 
that of the cellular retinoid-binding proteins, which are 
inherently more susceptible to tryptic cleavage at a single 
site in the apo form [46]. A model was proposed 
whereby the ligand binding site is "capped" by the 
movement of a helical segment after ligand binding. 
The susceptible site (Arg30-Lys31) is contained in 
this region. The extra interactions between the largely 
hydrophobic ligand and the protein anchor the loop to 
the protein bulk and restrict local segmental mobility. 

Hydrophobic interactions are not the only forces 
that tether loops. Recent studies on the avidin-biotin 
system highlighted electrostatic interactions between 
a loop and a bound ligand which accomplish a 
similar result [47]. The loop between strands 3 and 4 of 
the avidin calyx (3-4 loop) makes a pair of hydrogen 
bonds with one of the valerate oxygens of the biotin 
ligand in the holo-form of the protein. The 
ligand-bound form is entirely refractory to proteolysis under 
the same conditions. These interactions are obviously 
lost in the apo-form, which is cleaved by proteinase K 
specifically at two sites, Thr40-Ser41 and 
Asn42-Glu43, within the 3-4 loop. Although this result is 
easily rationalised, a more surprising effect on 
proteolytic susceptibility was observed with an alternative 
ligand. The chromatogenic reporter 
4'-hydroxyazobenzene-2-benzoic acid (HABA) binds to the same site 
as biotin and in the same mode; the X-ray crystal 
structure of this complex is known [48]. However, 
HABA lacks a polar group corresponding to the valerate 
of biotin and the hydrogen bonds with the loop are 
lost. Most surprisingly, the rate of proteolysis of this 
loop by proteinase K is enhanced, almost by an order 
of magnitude, with respect to hydrolysis of the 
wild-type apo-enzyme. This result raises some important 
issues concerning loop dynamics and proteolysis. 
How does a lack of tethering increase proteolytic 
susceptibility? Perhaps the HABA ligand induces the 
loop to protrude more from the protein surface, 
increasing its likelihood of being accessed by the 
attacking enzyme? Similarly, the reduced interactions 
between the loop and the protein might make the 
loop more flexible, allowing it to take up more 
conformations. This would increase the frequency with 
which the loop passes through a conformation 
acceptable to the active site of the proteinase. Of course, all 
this speculation ignores the enzyme itself. It is quite 
possible that the enzyme may also exhibit some 
inductive effects on loop dynamics. These questions 
remain unanswered, not least because the 3-4 loop in 
the crystal structure of the HABA holo-form is 
disordered and invisible to the crystallographers [48]. 
Nevertheless, this system shows how proteolytic 
susceptibility can be modified by different ligands in quite 
different ways. 

7. Limited proteolysis as a probe of unfolding/refolding 

Although proteolytic susceptibility is not the sole 
determinant of protein thermal stability, the two are 
undoubtedly linked [49]. This general observation, 
based on the early experiments of Linderstrøm-Lang 
[18], underpins much of the experimentation carried 
out on protein (un)folding using proteases as 
structural probes. When a protein is destabilised, either by 
heating or by chemical denaturation, the resulting 
unfolding must increase the inherent susceptibility of 
the protein to proteolytic attack. If the unfolding is 
global, then the proteolysis will be general and the 
protein is degraded to completion. This is an 
"all-or-nothing" proteolytic event; either the protein is 
degraded to completion or not at all. However, if one 
segment of the protein is unfolded locally without 
substantially affecting the overall fold, it may be cut 
in a limited manner and the nicked protein is stable 
and remains resistant to further degradation. In this 
instance, nicked products will accumulate. If, 
however, the nicked form is considerably less stable, then 
global unfolding may occur via this intermediate, 
leading to complete degradation. These processes are 
summarised in Fig. 4. For this type of experiment to 
be successful, the protease should not readily cleave 
the native protein (or sub-domain) and the rate of 
proteolysis should be much faster than the rate of 
(local) unfolding. Similarly, the amount of energy 
required for global unfolding should be much larger 
than for local. If these conditions are satisfied, at 
elevated temperatures, local unfolding of some 
segments may be induced and can then be studied using 
proteases as probes of these partly-(un)folded states. 
Given that protein unfolding is simply the reverse 
process of protein folding, inferences may be made 
on the latter stages of folding. 

Fig. 4. Unfolding and proteolysis scheme for protein structural 
studies. For unfolding studies to be successful the protein 
stability at a given temperature should favour local unfolding above 
global unfolding ($\Delta G_{\mathrm{local}} < \Delta G_{\mathrm{fold}}$), otherwise complete protein 
degradation occurs when the temperature increases. Similarly, if 
the nicked species is not destabilised significantly 
($\Delta G_{\mathrm{fold}} \approx \Delta G_{\mathrm{nick}}$) then the proteolysis product may be observed. If the 
nicked species is significantly destabilised, global unfolding 
occurs and the protein is degraded to completion. The native 
folded protein should ideally be refractory to proteolysis under 
"native" conditions, so that partly folded intermediates may be 
induced via heating or chemical denaturants. 
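
The scheme in Fig. 4 can be illustrated with a minimal kinetic sketch. It assumes a two-step model that is not spelled out in the text: a native segment opens locally with rate constant k_open, recloses with k_close, and is cut from the open state with a pseudo-first-order rate constant k_cut. These names, and the steady-state treatment of the open state, are assumptions made for illustration only.

```python
def observed_nicking_rate(k_open: float, k_close: float, k_cut: float) -> float:
    """Apparent first-order rate constant for native -> nicked.

    Hypothetical scheme:  native <-> open -> nicked
    with rate constants   k_open, k_close and k_cut respectively.
    A steady-state approximation is applied to the (rare) open state.
    """
    if min(k_open, k_close, k_cut) < 0:
        raise ValueError("rate constants must be non-negative")
    if k_close + k_cut == 0:
        raise ValueError("k_close and k_cut cannot both be zero")
    return k_open * k_cut / (k_close + k_cut)


def limiting_regime(k_close: float, k_cut: float, factor: float = 100.0) -> str:
    """Label the regime; 'factor' is an arbitrary illustrative threshold."""
    if k_cut >= factor * k_close:
        return "opening-limited"      # observed rate approaches k_open
    if k_close >= factor * k_cut:
        return "equilibrium-limited"  # observed rate approaches (k_open / k_close) * k_cut
    return "mixed"
```

In the opening-limited regime every local unfolding event is trapped by the protease, so the measured nicking rate reports directly on the local unfolding rate. In the equilibrium-limited regime it reports on the population of the open state instead.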

The autolytic degradation of thermolysin-like 
proteases represents a classic example of the thermal 
unfolding model, where in this case the protein is 
also the attacking protease [50,51]. At elevated 
temperatures these proteins become irreversibly 
inactivated due to proteolysis caused by partial (local) 
unfolding processes [50]. The thermal stability (and 
hence proteolytic stability) of the protein is largely 
due to a small surface region and the enzyme can be 
stabilised by a few mutations in this putative 
autolytic site [52]. 

The unfolding of ribonuclease has also been probed 
by proteinases, using both heating and chemical 
denaturants to induce partly folded states [53-56]. Early 
studies monitored the degradation products of the 
protein across a temperature range from 20°C to 65°C, 
digesting with chymotrypsin [53] and trypsin [54]. 
The region between residues 17 and 23 is labile 
above 35°C since chymotrypsin selectively cleaves at 
Tyr25-Cys26, although the native protein is resistant 
to chymotrypsin at room temperature. Similarly, 
trypsin was able to cleave at Lys31-Ser32 and 
Arg33-Asn34 after heating. Some regions of the 
protein were more stable: no cleavage was observed 
between Asn62-Gln74, Ile81-Thr87 and 
Ala96-Ala102 even at 60°C. More recently, the trypsin 
results were confirmed by Arnold and co-workers 
[55], who also found that thermolysin could cleave 
ribonuclease at Asn34-Leu35 and Thr45-Phe46 at 
and above 50°C. The results of all these studies 
suggest that ribonuclease folding elements have 
varying levels of stability. The mobile loop around residue 
20 is readily cleaved by most proteinases even at 
20°C. At slightly higher temperatures this region is 
extended out towards residue 25 where chymotrypsin 
may cleave. At higher temperatures still, the end of 
the helix from Asn24-Asn34 must be partly unfolded 
to allow cleavage by trypsin at this region, and must 
also destabilise the following β-strand to permit 
hydrolysis by thermolysin at residue 45. Ribonuclease 
unfolding has also recently been studied by 
denaturation with guanidinium chloride [56]. After 3 h of 
digestion in 1 M guanidinium chloride trypsin cleaves 
at Arg33-Asn34, similarly to thermal denaturation 
experiments, and additionally at Arg10-Gln11. 

The refolding of ribonuclease has also been 
studied using urea as a denaturing agent by monitoring 
tryptic digestion products over time as the urea 
concentration is lowered by dilution [57]. The putative 
sites at Lys31-Ser32 and Arg33-Asn34 remained 
susceptible until the latter stages of refolding. This is 
in agreement with the unfolding data previously 
discussed, which suggest this region is the last to fold. 

A number of recent studies have focused on the 
limited proteolysis of proteins in the presence of 
trifluoroethanol (TFE), which is known to favour the 
formation of α-helix [58-60]. In solution, tertiary 
structure is believed to be lost in favour of increased 
α-helical content as demonstrated by CD and protein 
NMR experiments [61] and hence the protein adopts 
a partially folded state. Thermolysin has usually been 
used due to its stability in TFE. When ribonuclease A 
is incubated in 50% TFE, thermolysin primarily cuts 
at Asn34-Leu35 although the native protein is 
resistant to proteolysis [58]. Similarly, normally resistant 
lysozyme is selectively nicked at Lys97-Ile98 by 
thermolysin in 50% TFE [59]. Conversely, horse 
heart cytochrome c is fully degraded by thermolysin 
to many small peptides in aqueous buffer whilst only 
limited nicking occurs in 50% TFE at the Gly56-Ile57 
bond [60]. These experiments have demonstrated the 
utility of combining proteolysis and TFE to study 
partially folded states and complement other 
biophysical techniques in the study of protein folding. 

α-Lactalbumin can be unfolded by acid denaturation 
and by removal of the single bound calcium with a 
chelating agent [62,63]. In both instances, a partially 
folded state similar in character to the "molten 
globule" was induced [64]. This form demonstrated 
enhanced susceptibility to pepsin, chymotrypsin and 
proteinase K at essentially the same region (Ala40 to 
Phe53) although the two domains either side of this 
segment were resistant. This suggests that, despite their 
molten globule classification, these partly folded 
states retain significant native-like structure 
in partially denaturing conditions. 

Further recent studies on apomyoglobin have 
shown that the F-helix becomes distorted in the 
apo-protein as it is susceptible to a range of proteinases 
whilst the holo-protein (heme-bound) remains 
resistant to proteolysis under the same conditions [65]. 
Additional minor nicking of the apo-form is observed 
at the B-helix, which is also believed to be somewhat 
mobile. The data strongly suggest that despite the 
limited proteolysis, the overall native state remains 
intact and the apo-protein cannot be described as a true 
molten globule. Rather, partial unfolding of the 
F-helix (and to some extent the B-helix) is responsible 
for the increased dynamic properties of selected 
regions and subsequent pattern of proteolysis. These data 
are in agreement with previous computational and 
spectroscopic data on the folding of apo-myoglobin 
which suggest that the F-helix is the last to fold 
[66-68]. 



8. Prediction of limited proteolytic sites 

Limited proteolysis may also be expressed as a 
prediction problem. Given a protein structure or 
sequence and an attacking enzyme, where will the first 
sites of hydrolysis be located? The ability to predict 
from either structure or sequence is clearly of great 
benefit. Knowledge of the most susceptible site will 
permit rational redesign via protein engineering to 
enhance or reduce proteolytic susceptibility, particularly 
if the structure is known. Similarly, successful 
prediction from sequence will inform on the likely 
structure, assist modelling and provide testable 
hypotheses. However, the prediction of limited proteolysis 
has been addressed by only a few groups, and has 
usually focused on a single system of interest. In a group 
of 2S albumins from Brassica napus a strong sequence 
propensity to form a β-turn was noted at proteolytic 
processing sites [69]. This suggests an elementary 
prediction scheme for these proteins. A similar 
scheme was developed to predict the likely 
processing sites in prohormonal proteins via a probabilistic 
scheme to predict potential Ω-loops [70]. These are 
regions containing no regular secondary structure, 
between 6 and 16 residues in length, and with an 
end-to-end distance less than 10 Å. 
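
The geometric definition quoted above is simple enough to sketch in code. The following is a minimal illustration, not the probabilistic, sequence-based scheme of ref. [70]: it assumes that a three-state secondary structure string ('H', 'E', 'C', with 'C' meaning no regular structure) and α-carbon coordinates in Å are already available, and the function name and argument layout are hypothetical.

```python
import math


def find_omega_like_segments(ss, ca_coords, min_len=6, max_len=16,
                             max_end_distance=10.0):
    """Return (start, end) pairs (0-based, inclusive) of candidate segments.

    A segment qualifies if it contains only coil ('C') residues, is
    min_len..max_len residues long, and the distance between the alpha
    carbons of its first and last residue is below max_end_distance.
    """
    if len(ss) != len(ca_coords):
        raise ValueError("ss and ca_coords must have the same length")
    hits = []
    n = len(ss)
    for start in range(n):
        for length in range(min_len, max_len + 1):
            end = start + length - 1
            if end >= n:
                break
            if any(state != "C" for state in ss[start:end + 1]):
                break  # any longer segment from this start also fails
            if math.dist(ca_coords[start], ca_coords[end]) < max_end_distance:
                hits.append((start, end))
    return hits
```

Overlapping candidates are all reported; a practical tool would probably merge or rank them, which is left out here.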

The above studies apply only to very limited 
systems. Clearly a global approach to the problem is 
required and a useful first step would be the ability to 
predict the sites of initial proteolysis in proteins of 
known tertiary structure. To achieve this goal the 
conformational analysis of a set of tryptic nicksites in 
proteins of known structure [26] was expanded to 
include additional conformational parameters and sites 
cut by proteinases of broader specificity in order to 
derive a prediction algorithm. A description of these 
parameters and the reasons for their consideration are 
listed in Table 1. Preliminary analyses and modelling 
have already shown that these parameters are 
correlated with limited proteolysis [26,28,71]. A thorough 
analysis revealed that no single conformational 
parameter is a perfect indicator of limited proteolysis 
and that a uniformly successful prediction could only 
be achieved when they are all combined in a weighted 
prediction scheme [72]. Additionally, since limited 
proteolytic sites typically require 10 residues or more 
to unfold locally, smoothing windows were applied to 
the parameter values. A Monte Carlo optimisation 
procedure was implemented to produce the most 
favourable combination of weights and smoothing 
windows with which to combine the parameters. This 
was based on their ability to discriminate the true 
nicksite bonds from putative (but not cleaved) sites 
and all peptide bonds. A set of weights and 
smoothing windows was found that correctly predicted the 
first site of nicking for every protein in a data set cut 
by narrow specificity proteinases (trypsin, 
V8-proteinase, and endoproteinases Arg-C and Lys-C). Some 
representative prediction profiles are illustrated in 
Fig. 5. The true sites of first proteolysis are 
invariably at peaks in the prediction profile and one of the 
nicksites is top scoring for each protein. The results 
were not as good for proteins where data were only 
available for broad specificity enzymes such as 
subtilisin and thermolysin. In these cases, however, the 
nicksites were always near the top of peaks and in all 
but a few cases the top scoring residue was within 2 
or 3 amino acids of the first cut site. 

Table 1 
Conformational parameters and their relationship to limited proteolysis 

Parameter: Solvent accessibility 
Reason for inclusion: Nicksites already known to be generally at 
surface exposed sites. 
Method/technical details: A probe rolled around the protein exterior 
assigns areas in Å² to each atom which are summed over each residue [73]. 

Parameter: Protrusion index 
Reason for inclusion: Nicksites would be expected to protrude from the 
protein surface to enable accommodation into the enzyme active site. 
Method/technical details: Each amino acid is assigned a score from 0 to 9 
depending on its protrusion index calculated from a set of similar 
equimomental ellipsoids with origins at the protein centre of mass [74]. 

Parameter: Temperature factors 
Reason for inclusion: Nicksites are found at flexible regions of the 
protein, as characterised by temperature factors or B-values from 
X-ray crystallography determinations. 
Method/technical details: The mean residue temperature factor is 
calculated by summing and averaging individual atomic values. 

Parameter: Ooi numbers 
Reason for inclusion: Nicksites would be expected to be at weakly packed 
regions of the structure which are able to locally unfold more easily. 
Method/technical details: Each amino acid is assigned an Ooi number score 
which is simply the number of α-carbon centres within a fixed radius [75]. 
Scores are normalised and subtracted from unity so that higher scores are 
more favourable. 

Parameter: Secondary structure 
Reason for inclusion: Nicksites are rarely found in regular secondary 
structure and there are geometrical and energetic reasons for their 
exclusion from helix and sheet. 
Method/technical details: Each amino acid is assigned to one of the three 
secondary structure states helix, strand or coil [76]. Each state is 
assigned a score, with an additional penalty 

Parameter: Hydrogen bonding 
Reason for inclusion: Nicksites would be expected to be at regions of the 
structure which are not overly pinned down by interactions with the bulk 
of the protein such as hydrogen bonds. 
Method/technical details: The number of non-local hydrogen bonds is 
calculated from residues within a fixed window about each amino acid to 
those outside it. Values are normalised and subtracted from unity so that 
higher scores are more favourable. 
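
The idea of combining smoothed conformational parameters in a weighted profile can be sketched as follows. This is not the Nickpred program [72]; the function names, the centred moving average, the normalisation by total weight and any weights or window sizes passed in are assumptions made purely for illustration. Each parameter is taken to be a per-residue list already scaled so that higher values are more favourable.

```python
def smooth(values, window):
    """Centred moving average; the window is truncated at the chain ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def prediction_profile(parameters, weights, windows):
    """Weighted sum of smoothed parameters, one score per residue.

    parameters: dict mapping a parameter name to its per-residue scores
    weights:    dict mapping the same names to non-negative weights
    windows:    dict mapping the same names to odd smoothing windows
    """
    lengths = {len(values) for values in parameters.values()}
    if len(lengths) != 1:
        raise ValueError("all parameters must cover the same residues")
    total_weight = sum(weights[name] for name in parameters)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    profile = [0.0] * lengths.pop()
    for name, values in parameters.items():
        for i, value in enumerate(smooth(values, windows[name])):
            profile[i] += weights[name] * value / total_weight
    return profile


def top_site(profile, candidate_positions):
    """Highest-scoring position among those matching the enzyme specificity."""
    return max(candidate_positions, key=lambda i: profile[i])
```

For a narrow specificity enzyme, candidate_positions would hold only the residues matching its primary sequence specificity. An optimisation over weights and windows, such as the Monte Carlo procedure described above, would then repeatedly call prediction_profile and score how often top_site recovers the known nicksite.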

An overall conclusion from the analysis is that, 
although all are important, the types of 
conformational determinants of limited proteolysis can be 
broken down into 3 main categories: 

1. Flexibility. The ability to unfold locally is crucial 
for limited proteolysis to take place and regions 
with high segmental mobility are better placed to 
accomplish this. This is characterised by X-ray 
temperature factors. 

2. Exposure. Although not an absolute necessity, 
sites near the surface will be able to undergo local 
unfolding more easily. Similarly, the 
conformational change required at surface sites is likely to 
be less since they are already partly protruding 
from the surface and hence are more readily 
accommodated into an enzyme's active site without 
causing intermolecular steric hindrance. This is 
characterised by accessibility, protrusion and Ooi 
numbers. 

3. Local interactions. A good candidate for local 
unfolding and adaptation must not be unduly 
restrained by interactions such as hydrogen bonds, 
disulphide linkages and van der Waals 
interactions. This is characterised by hydrogen bonding, 
secondary structure and Ooi numbers. 

Fig. 5. Limited proteolysis prediction profiles. Prediction profiles are illustrated for four example proteins, calmodulin (Brookhaven 
databank code [79], 1LIN), hemocyanin (1HCY), cellular retinol-binding protein II (1OPA) and elastase (3EST). The true nicksites are 
indicated as filled circles on the profiles whilst putative sites matching the primary sequence specificity are indicated by empty circles. 
The profiles were calculated using the Nickpred program [72] using the optimised weights and smoothing window for narrow-specificity 
proteinases. 

It is the requirement to unfold locally that appears 
to be the key determinant, which will be correlated to 
all of these parameters to a greater or lesser extent. 
Certainly, predictions are improved if the conforma- 
tional parameters are considered together in a 
weighted scheme rather than apart. 
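
Of the parameters discussed in this section, the Ooi number is the simplest to compute directly from coordinates, and a short sketch shows how a packing count becomes a score in which higher values are more favourable. The 8 Å radius, the exclusion of the residue itself from its own count and the normalisation by the largest count in the protein are all assumptions made here; the text specifies only "a fixed radius" and that scores are "normalised and subtracted from unity".

```python
import math


def ooi_numbers(ca_coords, radius=8.0):
    """Count the alpha carbons within 'radius' of each residue's own
    alpha carbon (the residue itself is not counted)."""
    counts = []
    for i, centre in enumerate(ca_coords):
        neighbours = sum(
            1
            for j, other in enumerate(ca_coords)
            if j != i and math.dist(centre, other) <= radius
        )
        counts.append(neighbours)
    return counts


def ooi_scores(ca_coords, radius=8.0):
    """Normalise by the largest count and subtract from unity, so that
    weakly packed residues receive the highest scores."""
    counts = ooi_numbers(ca_coords, radius)
    highest = max(counts, default=0)
    if highest == 0:
        return [1.0] * len(counts)
    return [1.0 - count / highest for count in counts]
```

On this scale a residue in the densest part of the structure scores 0 and an isolated residue scores 1, which matches the direction of the other parameters in a weighted scheme.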



9. Future directions 

Given that prediction of limited proteolysis from 
structure is generally possible, the challenge remains 
to generate a method able to accomplish this from 
sequence alone. However, the structural determinants 
of the molecular recognition process are still not 
completely determined. The precise degree of local 
unfolding required to facilitate limited proteolysis has 
yet to be established. Similarly, the degree to 
which unfolding polypeptides must mimic the 
idealised substrate binding of proteinase protein 
inhibitors needs to be quantified. With 
this information it may be possible to rationally 
design mutations that enhance or reduce 
susceptibility to proteolytic attack and, by inference, the 
fundamental stability of proteins. 

As yet a complete understanding of the dynamics 
of limited proteolysis is missing. Given the varied 
conditions under which limited proteolysis is 
conducted, little comparative data is available. Similarly, 
the majority of experimentalists view the limited 
proteolytic process as a means to an end and do not 
calculate rate constants. However, the rate of limited 
proteolysis is an important factor as the process is 
undoubtedly not a binary "cut or not cut" problem. 
Indeed, in systems where the unfolding of the protein 
is much slower than the proteolysis the possibility of 
directly measuring rates for local unfolding exists. 
The understanding of protein folding has clearly 
benefited from the use of proteinases as structural probes 
and their continued use promises to reveal even more 
about the partially folded states of proteins. 
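
As an illustration of how a rate constant might be extracted from a digestion time course, the sketch below fits a single exponential decay to the fraction of intact protein remaining. It assumes pseudo-first-order kinetics (protease activity constant over the experiment) and that the intact fraction has already been quantified, for example by densitometry of gel bands; the function name and input format are hypothetical.

```python
import math


def first_order_rate_constant(times, intact_fractions):
    """Estimate k from  fraction(t) = exp(-k * t)  by an ordinary
    least-squares fit of ln(fraction) against time.

    The returned value has units of 1 / (units of 'times').
    """
    if len(times) != len(intact_fractions):
        raise ValueError("times and intact_fractions must have equal length")
    if len(times) < 2:
        raise ValueError("at least two time points are required")
    if any(fraction <= 0 for fraction in intact_fractions):
        raise ValueError("fractions must be positive to take logarithms")

    logs = [math.log(fraction) for fraction in intact_fractions]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    return -sxy / sxx  # the slope of ln(fraction) versus t is -k
```

Where proteolysis is much faster than local unfolding, a constant obtained this way would approximate the local unfolding rate. Comparing values measured under matched conditions is what would allow different systems to be set side by side.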

With a more complete understanding of the 
molecular recognition events surrounding limited 
proteolysis and an ability to predict changes upon alteration 
of the substrate, rational redesign of substrates may 
be possible. This has obvious consequences for 
biotechnological uses: increasing the proteolytic 
stability of a protein, such as for therapeutic purposes, or 
conversely, reducing it to give a protein a more limited 
life span. However, even with current understanding, 
limited proteolysis is still a powerful, and perhaps 
undervalued, tool for extracting structural information 
about protein systems. 